This notebook will help me talk through a bit of material regarding
spatial data, with some introduction to
spatial joins.
Key topics for today:
sf package
tmap package
tidycensus package
Standards:
library(knitr)
Warning: package ‘knitr’ was built under R version 4.2.1
library(tidyverse)
Warning: package ‘tidyverse’ was built under R version 4.2.1
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
── Attaching packages ──────────────────────────────────────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6 ✔ purrr 0.3.5
✔ tibble 3.1.8 ✔ dplyr 1.0.10
✔ tidyr 1.2.1 ✔ stringr 1.4.1
✔ readr 2.1.3 ✔ forcats 0.5.2
Warning: package ‘ggplot2’ was built under R version 4.2.1
Warning: package ‘tibble’ was built under R version 4.2.1
Warning: package ‘tidyr’ was built under R version 4.2.1
Warning: package ‘readr’ was built under R version 4.2.1
Warning: package ‘purrr’ was built under R version 4.2.1
Warning: package ‘dplyr’ was built under R version 4.2.1
Warning: package ‘stringr’ was built under R version 4.2.1
Warning: package ‘forcats’ was built under R version 4.2.1
── Conflicts ─────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
library(janitor)
Warning: package ‘janitor’ was built under R version 4.2.1
Attaching package: ‘janitor’
The following objects are masked from ‘package:stats’:
chisq.test, fisher.test
library(lubridate) # because we will probably see some dates
Warning: package ‘lubridate’ was built under R version 4.2.1
Attaching package: ‘lubridate’
The following objects are masked from ‘package:base’:
date, intersect, setdiff, union
library(here) # a package I haven't taught you about before that doesn't do much, but ....
Warning: package ‘here’ was built under R version 4.2.1
here() starts at U:/DS 241/ds241_f22/analysis/work
Some additional packages focused on today’s work:
library(sf) # working with simple features - geospatial
Warning: package ‘sf’ was built under R version 4.2.2
Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(tmap)
Warning: package ‘tmap’ was built under R version 4.2.2
Registered S3 method overwritten by 'htmlwidgets':
method from
print.htmlwidget tools:rstudio
library(tidycensus)
Warning: package ‘tidycensus’ was built under R version 4.2.2
sf: https://r-spatial.github.io/sf/articles/
tmap: https://cran.r-project.org/web/packages/tmap/vignettes/tmap-getstarted.html
tidycensus package: https://walker-data.com/tidycensus/index.html
tidycensus: https://walker-data.com/census-r/index.html
Our first data source comes from opendata.dc:
https://opendata.dc.gov/datasets/DCGIS::dc-health-planning-neighborhoods/about
I will use the GeoJSON file. (Newer, not necessarily better, but … a single file. Not smaller, but … this one is not big.)
The data is easy to read in:
neigh=st_read(here("DC_Health_Planning_Neighborhoods.geojson")) %>% clean_names()
Reading layer `DC_Health_Planning_Neighborhoods' from data source
`U:\DS 241\ds241_f22\analysis\work\DC_Health_Planning_Neighborhoods.geojson' using driver `GeoJSON'
Simple feature collection with 51 features and 8 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -77.11976 ymin: 38.79165 xmax: -76.9094 ymax: 38.99556
Geodetic CRS: WGS 84
class(neigh)
[1] "sf" "data.frame"
plot(neigh)
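# Two small tables to illustrate how an ordinary (non-spatial) join works
df1=tibble(fruit=c("apple","banana","cherry"),cost=c(1.5,1.2,2.25))
df2=tibble(fruit=c("apple","apple","cherry","lemon"),
           dessert=c("pie","cobbler","cobbler","cheesecake"),
           cal=c(400,430,500,550))
df1
df2
# Keep every row of df1 and attach the matching rows of df2, matched on fruit
left_join(df1,df2,by="fruit")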
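To make the behavior of the join above concrete, here is a small self-contained sketch using the same toy tables. It counts the rows that `left_join()` returns and uses `anti_join()` to list the keys that find no partner on either side. The object names `joined`, `only_in_df1`, and `only_in_df2` are mine, chosen for illustration.

```r
library(dplyr)
library(tibble)

df1 <- tibble(fruit = c("apple", "banana", "cherry"),
              cost  = c(1.5, 1.2, 2.25))
df2 <- tibble(fruit   = c("apple", "apple", "cherry", "lemon"),
              dessert = c("pie", "cobbler", "cobbler", "cheesecake"),
              cal     = c(400, 430, 500, 550))

# left_join keeps every row of df1:
#   apple matches two rows of df2, so it appears twice;
#   banana has no match, so dessert and cal are NA;
#   lemon exists only in df2, so it is dropped.
joined <- left_join(df1, df2, by = "fruit")
nrow(joined)  # 4

# anti_join lists the rows that have no partner in the other table
only_in_df1 <- anti_join(df1, df2, by = "fruit")  # banana
only_in_df2 <- anti_join(df2, df1, by = "fruit")  # lemon
```

Running an `anti_join()` like this before a real join is a cheap way to spot keys that will silently turn into `NA` rows.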
Covid case information is available from opendatadc:
https://opendata.dc.gov/datasets/DCGIS::dc-covid-19-total-positive-cases-by-neighborhood/about
Read cases information:
df_c=read_csv(here("DC_COVID-19_Total_Positive_Cases_by_Neighborhood.csv")) %>% clean_names()
Rows: 26372 Columns: 4
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (2): DATE_REPORTED, NEIGHBORHOOD
dbl (2): OBJECTID, TOTAL_POSITIVES
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
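df_cases=df_c %>%
  filter(as_date(date_reported) == "2022-02-22") %>%           # keep a single reporting date
  separate(neighborhood,into=c("code","name"),sep = ":") %>%  # split "code: name" into two columns
  mutate(code=case_when(
    code=="N35" ~"N0",                                         # recode N35 as N0 before joining on code
    TRUE ~ code
  )) %>%
  select(-objectid,-date_reported)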
neigh2=left_join(neigh,df_cases,by=c("code"))
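The cleaning steps above (split the label, recode one value, then join on `code`) can be tried on made-up data first. The sketch below is only an illustration: the rows in `cases` and `shapes` are invented, and I am assuming the labels look like `"code: name"`. It also trims the space that `separate()` leaves in front of the name, which the chunk above does not do.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Invented rows in an assumed "code: name" shape
cases <- tibble(neighborhood    = c("N1: Alpha", "N35: Beta"),
                total_positives = c(10, 20))

# Invented stand-in for the polygon table; only the key column matters here
shapes <- tibble(code = c("N0", "N1"))

cases_clean <- cases %>%
  separate(neighborhood, into = c("code", "name"), sep = ":") %>%
  mutate(name = trimws(name),                      # drop the space left after the colon
         code = case_when(code == "N35" ~ "N0",    # same recode as in the notebook
                          TRUE ~ code))

# Any shape without case data? Zero rows means every code found a match.
unmatched <- anti_join(shapes, cases_clean, by = "code")
nrow(unmatched)  # 0

joined <- left_join(shapes, cases_clean, by = "code")
```

Without the recode, `N0` would have no match here and its `total_positives` would come back as `NA`.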
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(neigh2) +tm_polygons("total_positives",alpha=.5)
Let’s get some data using tidycensus. You need an API key, which you can request at https://api.census.gov/data/key_signup.html
census_api_key("YOUR_CENSUS_API_KEY") # use your own key; do not publish it in a shared notebook
To install your API key for use in future sessions, run this function with `install = TRUE`.
# what variables are available?
v20 = load_variables(2018,"acs5")
# median income: "B06011_001"
# total population: "B01001_001"
# Black alone or in combination: "B02009_001"
Get some data:
df_cencus=get_acs(geography = "tract",
                  variables=c("median_inc"="B06011_001",
                              "pop"="B01001_001",
                              "pop_black"="B02009_001"),
                  state="DC",geometry=TRUE,year=2018)
Getting data from the 2014-2018 5-year ACS
Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
Using FIPS code '11' for state 'DC'
class(df_cencus)
[1] "sf" "data.frame"
plot(df_cencus)
It’s in long format. Let’s make it wide.
df_cens=df_cencus %>% select(-moe) %>% spread(variable,estimate)
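For reference, the same long-to-wide reshaping can be written with `pivot_wider()`, which tidyr now recommends over `spread()`. This is a sketch on an invented plain tibble (no geometry); the `GEOID` values and numbers are made up.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Invented long-format data: one row per (tract, variable)
long <- tibble(
  GEOID    = c("A", "A", "B", "B"),
  variable = c("pop", "median_inc", "pop", "median_inc"),
  estimate = c(100, 40000, 300, 80000),
  moe      = c(5, 2000, 9, 3500)
)

# Drop moe first: it differs from row to row, so leaving it in would
# stop the rows for one tract from collapsing into a single row.
wide <- long %>%
  select(-moe) %>%
  pivot_wider(names_from = variable, values_from = estimate)

wide  # one row per GEOID, with columns pop and median_inc
```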
tm_shape(df_cens) +tm_polygons("median_inc",alpha=.5)
tm_shape(neigh2) +tm_borders(col="blue",lwd=5,alpha=.2)+
tm_shape(df_cens) +tm_borders(col="red",lwd=1,alpha=.3)
# Earlier attempts, left commented out:
#df_j=st_join(df_cens,neigh2)
#df_j=st_join(df_cens,neigh2,prepared=FALSE)
df_cens_adj=df_cens %>% st_transform(4326)
df_j=st_join(df_cens_adj,neigh2,largest=TRUE)
Warning: attribute variables are assumed to be spatially constant throughout all geometries
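Here is a toy version of the two steps above: put both layers in the same CRS, then assign each tract to the neighborhood it overlaps most with `largest=TRUE`. Everything in it is invented for illustration: the helper `rect()`, the two rectangular "neighborhoods", and the single "tract". I am assuming the tract layer arrives in NAD83 (EPSG:4269), which is why a transform is needed before the join.

```r
library(sf)

# Hypothetical helper: an axis-aligned rectangle as a polygon
rect <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}

# Two made-up neighborhoods in WGS 84 (EPSG:4326), side by side
hoods <- st_sf(hood = c("west", "east"),
               geometry = st_sfc(rect(-77.10, 38.90, -77.00, 39.00),
                                 rect(-77.00, 38.90, -76.90, 39.00),
                                 crs = 4326))

# One made-up tract in NAD83 (EPSG:4269) that straddles the boundary,
# with most of its area on the east side
tract <- st_sf(tract = "t1",
               geometry = st_sfc(rect(-77.02, 38.92, -76.94, 38.98),
                                 crs = 4269))

same_crs <- st_crs(tract) == st_crs(hoods)  # FALSE: the layers disagree

# Bring the tract into the neighborhoods' CRS, then join
tract_adj <- st_transform(tract, st_crs(hoods))

# largest = TRUE keeps one match per tract: the neighborhood with the
# largest area of overlap
joined <- st_join(tract_adj, hoods, largest = TRUE)
joined$hood  # "east"
```

Without `largest = TRUE`, the default `st_intersects` join would return one row per overlapping neighborhood, so this tract would show up twice.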
What about joining in the other order?
#df_j_rev = st_join(neigh2,df_cens_adj,largest=TRUE)
Since we want the geometry for the NEIGHBORHOODS, we need to work a little harder:
df1=df_j %>% select(median_inc,pop,pop_black,objectid) %>%
  group_by(objectid) %>%                                   # one group per neighborhood
  summarise(pop_n=sum(pop),
            pop_black_n=sum(pop_black),
            adj_median_income=sum(pop*median_inc)/pop_n)   # population-weighted mean of tract medians
plot(df1)
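The `adj_median_income` calculation above is a population-weighted mean of the tract medians. It approximates a neighborhood-level figure, but it is not the true median for the neighborhood. A sketch with invented numbers shows the arithmetic and checks it against base R's `weighted.mean()`:

```r
library(dplyr)
library(tibble)

# Invented tracts: two fall in neighborhood 1, one in neighborhood 2
tracts <- tibble(objectid   = c(1, 1, 2),
                 pop        = c(100, 300, 200),
                 median_inc = c(40000, 80000, 50000))

by_hood <- tracts %>%
  group_by(objectid) %>%
  summarise(pop_n = sum(pop),
            adj_median_income = sum(pop * median_inc) / pop_n,
            check = weighted.mean(median_inc, pop))  # same quantity

by_hood$adj_median_income
# neighborhood 1: (100*40000 + 300*80000) / 400 = 70000
# neighborhood 2: 50000
```

One thing to watch: if any tract in a group has a missing `median_inc`, the sum is `NA` and so is the neighborhood value. Dropping such tracts from both the numerator and the population total is one way to handle it, at the cost of ignoring those residents.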
#df2=left_join(neigh2,df1)
df2=left_join(neigh2,df1 %>% st_set_geometry(NULL))
Joining, by = "objectid"
df2=df2 %>% mutate(black_perc=pop_black_n/pop_n, covid_rate=total_positives/pop_n)
tm_shape(df2)+tm_polygons(c("adj_median_income","covid_rate","black_perc"))
df2 %>% filter(objectid!=30) %>% tm_shape()+tm_polygons(c("adj_median_income","covid_rate","black_perc"),alpha=.4)